43 research outputs found
Forecasting ozone threshold exceedances in urban background areas using supervised classification and easy-access information
Classification models that forecast exceedance of the ozone (O3) threshold established by European legislation are
rare in the literature, as is a focus on background O3, which shows higher concentrations at city outskirts. This study
evaluated the performance of nine classifiers in forecasting exceedance of this threshold by background O3. Models
used five large hourly background O3 data sets (2006–2015) and included temporal features describing the O3
formation dynamics. Bagging and stacking ensembles of these classifiers, and their cost of learning, were also
evaluated. The C5.0 and nnet classifiers achieved the best forecasting performance, even under imbalanced learning.
Bagging ensembles outperformed stacking approaches, although with little accuracy improvement over the
individual classifiers. The cost-of-learning analysis showed similar performance when reduced fractions of the original data
sets were used. The use of these models to forecast background O3 threshold exceedances is encouraged because of the
performance obtained and their easy reproducibility.
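As a rough illustration of the bagging idea mentioned in this abstract, the sketch below trains one-split classifiers (decision stumps) on bootstrap resamples and combines them by majority vote. It is not the study's pipeline: the stump learner merely stands in for classifiers such as C5.0 or nnet, and the two features, the toy values and the names `fit_stump`, `fit_bagging` and `predict_bagging` are assumptions made for the example.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Find the one-split rule with the fewest training errors.

    The rule predicts 1 (exceedance) when row[feature] > threshold.
    Returns (feature_index, threshold).
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            errors = sum((row[j] > t) != bool(label) for row, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, j, t)
    return best[1], best[2]


def fit_bagging(X, y, n_models=25, seed=0):
    """Train one stump per bootstrap resample (sampling rows with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, row):
    """Majority vote over the stumps."""
    votes = Counter(int(row[j] > t) for j, t in models)
    return votes.most_common(1)[0][0]


# Hypothetical features: [O3 at the same hour on the previous day,
# previous-day maximum O3], both in ug/m3. Label 1 marks an exceedance.
X = [[60, 70], [75, 90], [80, 85], [90, 100], [95, 110],
     [130, 150], [140, 165], [150, 170], [155, 185], [160, 190]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

models = fit_bagging(X, y)
high_day = predict_bagging(models, [200, 230])
low_day = predict_bagging(models, [30, 40])
```

With real, imbalanced data the vote share could be reported instead of a hard label, so that the decision cut-off can be tuned.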
Estimation of Particulate Matter Contributions from Desert Outbreaks in Mediterranean Countries (2015–2018) Using the Time Series Clustering Method
North African dust intrusions can contribute to exceedances of the European PM10 and PM2.5 limit values and World Health Organisation standards, diminishing air quality and increasing mortality and morbidity at higher concentrations. In this study, the contribution of North African dust in Mediterranean countries was estimated using the time series clustering method. This method combines the non-parametric approach of Hidden Markov Models for studying time series with the definition of different air pollution profiles (regimes of concentration). Using this approach, PM10 and PM2.5 time series obtained at background monitoring stations from seven countries were analysed from 2015 to 2018. The average characteristic contributions to PM10 were estimated as 11.6 ± 10.3 µg·m−3 (Bosnia and Herzegovina), 8.8 ± 7.5 µg·m−3 (Spain), 7.0 ± 6.2 µg·m−3 (France), 8.1 ± 5.9 µg·m−3 (Croatia), 7.5 ± 5.5 µg·m−3 (Italy), 8.1 ± 7.0 µg·m−3 (Portugal), and 17.0 ± 9.8 µg·m−3 (Turkey). For PM2.5, estimated contributions were 4.1 ± 3.5 µg·m−3 (Spain), 6.0 ± 4.8 µg·m−3 (France), 9.1 ± 6.4 µg·m−3 (Croatia), 5.2 ± 3.8 µg·m−3 (Italy), 6.0 ± 4.4 µg·m−3 (Portugal), and 9.0 ± 5.6 µg·m−3 (Turkey). The observed PM2.5/PM10 ratios were between 0.36 and 0.69, and their seasonal variation was characterised, with higher values in colder months. Principal component analysis enabled the association of background sites based on their estimated PM10 and PM2.5 pollution profiles.
Finite mixture models to characterize and refine air quality monitoring networks
Background Air quality monitoring programs are sometimes not updated to reflect changing local conditions, so they become
less informative over time, failing to detect new sources of pollutants or duplicating information. In addition, inadequate
maintenance of the monitoring equipment can further degrade the information provided. To address these issues, a combination of
statistical methods is used to optimize monitoring resources and to characterize the monitoring networks, introducing new criteria for
their refinement.
Methods Monitoring data on key pollutants, namely carbon monoxide (CO), nitrogen dioxide (NO2), ozone (O3), particulate
matter (PM10) and sulfur dioxide (SO2), were obtained from 12 air quality monitoring sites in Seville (Spain) during 2012. A total of 49 data sets were fitted with
Gaussian finite mixture models using the expectation-maximization (EM) algorithm. To summarize these 49 models, the mean (μm) and
coefficient of variation (cvm) of each mixture were calculated, and a hierarchical cluster analysis (HCA) was carried out to study how the
sites group according to these statistics. For pollutants not monitored at a given site, the missing values of μm and cvm
were imputed with a random forests algorithm using the known values. A principal component analysis (PCA) was then carried out to better understand
the relationship among the sites of the network and the consistency of their classification. All techniques were applied using the free, open-source
statistical software R.
Results and conclusions An example of source attribution and contribution is analyzed using mixture models, and the potential of these
models for characterizing pollution trends is put forward. The mixture statistics μm and cvm act as a fingerprint of each model, and their
use is new in characterizing mixture models in the air quality management discipline. The
imputation technique allowed concentration values of unmonitored pollutants to be estimated, providing information about unknown
pollution levels and suggesting new possible monitoring configurations for this network. A subsequent PCA confirmed the misclassification of one site
initially detected with HCA.
Universidad de Granada. Máster Universitario en Estadística Aplicada
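The mixture statistics μm and cvm can be obtained from the fitted component parameters through the first and second moments of the mixture. The sketch below shows that standard moment calculation; the abstract does not spell out the exact computation used, and the function name `mixture_mean_cv` and the example component values are assumptions.

```python
import math


def mixture_mean_cv(weights, means, sds):
    """Mean and coefficient of variation of a Gaussian mixture.

    mean   = sum_i w_i * mu_i
    E[X^2] = sum_i w_i * (sigma_i^2 + mu_i^2)
    cv     = sqrt(E[X^2] - mean^2) / mean
    """
    total = sum(weights)
    w = [x / total for x in weights]  # normalise in case weights do not sum to 1
    mean = sum(wi * mi for wi, mi in zip(w, means))
    second_moment = sum(wi * (si ** 2 + mi ** 2) for wi, mi, si in zip(w, means, sds))
    variance = second_moment - mean ** 2
    return mean, math.sqrt(variance) / mean


# Hypothetical two-component fit for one pollutant at one site (ug/m3).
mu_m, cv_m = mixture_mean_cv(weights=[0.5, 0.5], means=[10.0, 30.0], sds=[2.0, 2.0])
```

Each site and pollutant pair then reduces to one (μm, cvm) point, which is the kind of input a hierarchical clustering or PCA step can work with.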
Characterisation of air pollution from anthropogenic and natural contributions using finite mixture models, homogeneous Markov models and other data mining techniques
A wealth of scientific resources has been dedicated to studying the sources of air pollutant emissions in urban areas. Such studies may be quantitative, determining the contribution of each source to ambient pollution, or qualitative, providing insight into the composition of the emissions that affect a city's inhabitants. In Mediterranean countries, pollution caused by natural phenomena, such as the flow of dust from the arid regions of North Africa, is also of primary importance. The flow of particulate matter transcends these geographic areas, passing over the Atlantic Ocean and reaching the American coasts. Among the fundamental tools available for measuring air pollution are air quality monitoring networks, made up of monitoring stations located in both urban and rural environments, which aim to provide information on the air quality that affects us. In cities, some of these stations are located outside the direct range of emission sources, which makes it possible to determine the urban background pollution, indicative of the general exposure of the population to air pollution. The objectives of this thesis were the following:
1. To characterise air pollution exhaustively in urban and rural areas using the information obtained from air quality monitoring networks, developing for this purpose a general methodology for the efficient management of monitoring networks.
2. To improve the existing methodology for estimating the contribution of dust carried by warm air masses from the North African regions.
3. To compare air pollution levels between different urban monitoring networks that have no industrial influence and lie in different geographic locations, proposing a methodology for characterising ambient and background air pollution.
The results for each of these objectives are supported, respectively, by the following publications:
1. Gómez-Losada, Á., Lozano-García, A., Pino-Mejías, R., Contreras-González, J. 2014. Finite mixture models to characterize and refine air quality monitoring networks. Science of the Total Environment, 485-486: 292-9.
2. Gómez-Losada, Á., Pires, J.C.M., Pino-Mejías, R. 2015. Time series clustering for estimating particulate matter contributions and its use in quantifying impacts from deserts. Atmospheric Environment, 117: 271-81.
3. Gómez-Losada, Á., Pires, J.C.M., Pino-Mejías, R. 2016. Characterization of background air pollution exposure in urban environments using a metric based on Hidden Markov Models. Atmospheric Environment, 127: 255-61.
To fulfil the first objective, monitoring data for primary and secondary air pollutants were modelled using finite mixture models. Based on the first and second moments of these mixtures, hierarchical cluster analysis, imputation using random forests, and principal component analysis were applied. This methodological approach enabled the detection of duplications among the parameters monitored by the stations, allowing these networks to be reconfigured and the economic resources invested in them to be optimised.
For the second objective, hidden Markov models (HMMs) were introduced, and the different regimes, or PM10 concentration profiles, were described in some of the time series (TS) studied, enabling an estimation of the contribution of each profile to ambient pollution. The new method proposed for estimating the natural contribution to PM10 improves upon the reference methodology used in the European Union (the monthly moving 40th percentile method) in three ways: it avoids the use of an empirical approximation, it applies modelling specifically designed for time-series data, and it yields a confidence interval for the estimated contributions to PM10. For the third objective, hidden Markov models were also used, in this case to define and characterise the ambient and background pollution caused by primary air pollutants in different urban areas of different cities. The attributable fraction for background air pollution was estimated using a new procedure based on the first concentration profile defined by the HMMs in the TS. The ratio and difference between ambient and background concentrations were also studied.
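For orientation, a moving-percentile baseline of the kind named above as the reference method might look like the sketch below. It is only a loose reading of "monthly moving 40th percentile": the 31-day centred window, the linear-interpolation percentile rule, the use of pre-flagged dust days, and the names `percentile` and `dust_contribution` are all assumptions, and the abstract gives no implementation details.

```python
def percentile(values, q):
    """q-th percentile (0 to 100) with linear interpolation between order statistics."""
    s = sorted(values)
    if not s:
        raise ValueError("no values available to estimate the percentile")
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def dust_contribution(pm10, dust_day, half_window=15, q=40):
    """Daily dust load: observed PM10 minus a moving-percentile background.

    pm10        daily mean concentrations (ug/m3)
    dust_day    booleans flagging days with a dust intrusion (assumed given)
    half_window days on each side of the target day used for the background
    """
    result = []
    for i, (value, is_dust) in enumerate(zip(pm10, dust_day)):
        if not is_dust:
            result.append(0.0)
            continue
        lo = max(0, i - half_window)
        hi = min(len(pm10), i + half_window + 1)
        # The background is estimated from non-dust days only.
        background = [pm10[j] for j in range(lo, hi) if not dust_day[j]]
        result.append(max(0.0, value - percentile(background, q)))
    return result


# Hypothetical six-day series with a dust episode on the last day.
pm10 = [10.0, 20.0, 30.0, 40.0, 50.0, 90.0]
dust_day = [False, False, False, False, False, True]
loads = dust_contribution(pm10, dust_day)
```

A baseline like this returns a single point estimate per day, which is where the HMM-based proposal differs by also providing a confidence interval.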
Automatic Eligibility of Sellers in an Online Marketplace: A Case Study of Amazon Algorithm
Purchase processes on Amazon Marketplace begin at the Buy Box, which represents the buy click process through which numerous sellers compete. This study aimed to estimate empirically the relevant seller characteristics that Amazon could consider when featuring sellers in the Buy Box. To that end, 22 product categories from Italy’s Amazon web page were studied over a ten-month period, and the sellers were analyzed through their products featured in the Buy Box. Two different experiments were proposed and the results were analyzed using four classification algorithms (a neural network, random forest, support vector machine, and C5.0 decision trees) and a rule-based classification. The first experiment aimed to characterize sellers non-specifically by predicting their change at the Buy Box. The second one aimed to predict which seller would be featured in it. Both experiments revealed that the customer experience and the dynamics of the sellers’ prices were important features of the Buy Box. Additionally, we proposed a set of default features that Amazon could consider when no information about sellers was available. We also proposed the possible existence of a relationship or composition among important features that could be used for sellers to be featured in the Buy Box.
Time series clustering for estimating particulate matter contributions and its use in quantifying impacts from deserts
Source apportionment studies use prior exploratory methods that are not purpose-oriented and receptor
modelling is based on chemical speciation, requiring costly, time-consuming analyses. Hidden Markov
Models (HMMs) are proposed as a routine, exploratory tool to estimate PM10 source contributions. These
models were used on annual time series (TS) data from 33 background sites in Spain and Portugal. HMMs
enable the creation of groups of PM10 TS observations with similar concentration values, defining the
pollutant's regimes of concentration. The results include estimations of source contributions from these
regimes, the probability of change among them and their contribution to annual average PM10 concentrations. The annual average Saharan PM10 contribution in the Canary Islands was estimated and
compared to other studies. A new procedure for quantifying the wind-blown desert contributions to
daily average PM10 concentrations from monitoring sites is proposed. This new procedure seems to
correct the net load estimation from deserts achieved with the most frequently used method.
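To make the notion of concentration regimes concrete, the sketch below decodes the most likely regime sequence of a short PM10 series with the Viterbi algorithm for a Gaussian HMM, then splits the series mean by regime. The HMM parameters are fixed by hand rather than estimated, the contribution definition (each regime's share of the series mean) is an assumption, and the names `viterbi` and `regime_contributions` are illustrative only.

```python
import math


def log_gauss(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(obs, start, trans, means, sds):
    """Most likely hidden-state path for a Gaussian-emission HMM (log space)."""
    k = len(means)
    score = [math.log(start[s]) + log_gauss(obs[0], means[s], sds[s]) for s in range(k)]
    back = []
    for x in obs[1:]:
        prev, score, pointers = score, [], []
        for s in range(k):
            best = max(range(k), key=lambda r: prev[r] + math.log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans[best][s])
                         + log_gauss(x, means[s], sds[s]))
        back.append(pointers)
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):  # trace the best path backwards
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def regime_contributions(obs, path, k):
    """Part of the series mean carried by each regime; the parts sum to the mean."""
    n = len(obs)
    return [sum(x for x, s in zip(obs, path) if s == r) / n for r in range(k)]


# Hypothetical daily PM10 values (ug/m3) and hand-set two-regime parameters.
pm10 = [12.0, 14.0, 13.0, 40.0, 42.0, 15.0, 13.0]
path = viterbi(pm10, start=[0.8, 0.2], trans=[[0.9, 0.1], [0.3, 0.7]],
               means=[13.0, 41.0], sds=[3.0, 5.0])
contributions = regime_contributions(pm10, path, k=2)
```

In practice the parameters, including the transition matrix that gives the probability of change among regimes, would be estimated from the data, for example with the Baum-Welch algorithm.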
Modelling background air pollution exposure in urban environments: Implications for epidemiological research
Background pollution represents the lowest levels of ambient air pollution to which the population is
chronically exposed, but few studies have focused on thoroughly characterizing this regime. This study
uses statistical clustering techniques as a modelling approach to characterize this pollution regime while
deriving reliable information to be used as estimates of exposure in epidemiological studies. The background levels of four key pollutants in five urban areas of Andalusia (Spain) were characterized over an
11-year period (2005–2015) using four widely known clustering methods. For each pollutant data set,
the first (lowest) cluster, representative of the background regime, was studied using finite mixture
models, agglomerative hierarchical clustering, hidden Markov models (HMM) and k-means. The HMM
method outperformed the other techniques, providing important estimates of exposure
related to background pollution, such as its mean, acuteness and time incidence values in ambient air, for
all the air pollutants and sites studied.
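The idea of reading the background regime off the lowest cluster can be sketched with k-means, one of the four methods compared (and not the best performer in the study). The example below is a minimal one-dimensional version; the quantile-based initialisation, the interpretation of time incidence as the fraction of observations in the lowest cluster, and the names `kmeans_1d` and `background_summary` are assumptions. Acuteness is not computed because the abstract does not define it.

```python
def kmeans_1d(values, k, max_iter=100):
    """Plain k-means on scalar data with a deterministic quantile start."""
    s = sorted(values)
    centres = [s[int((i + 0.5) * len(s) / k)] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            groups[nearest].append(v)
        # Keep the old centre if a cluster ends up empty.
        updated = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
        if updated == centres:
            break
        centres = updated
    return centres, groups


def background_summary(values, k=3):
    """Mean of the lowest cluster and the fraction of observations it holds."""
    centres, groups = kmeans_1d(values, k)
    lowest = min(range(k), key=lambda c: centres[c])
    return centres[lowest], len(groups[lowest]) / len(values)


# Hypothetical hourly concentrations (ug/m3) with three visible levels.
hourly = [8, 9, 10, 11, 12, 30, 32, 34, 70, 75]
background_mean, time_incidence = background_summary(hourly)
```

Unlike an HMM, this treats observations as unordered, which is one plausible reason a time-aware method could describe the regime better.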
A novel approach to forecast urban surface-level ozone considering heterogeneous locations and limited information
Surface ozone (O3) is considered a hazard to human health and affects vegetation, crops and ecosystems.
Accurate forecasting of O3 in time and location can help protect citizens from unhealthy exposure when high levels
are expected. Usually, forecasting models use numerous O3 precursors as predictors, limiting the reproducibility
of these models to the availability of such information from data providers. This study introduces a 24 h-ahead
hourly O3 concentrations forecasting methodology based on bagging and ensemble learning, using just two
predictors with lagged O3 concentrations. This methodology was applied on ten-year time series (2006–2015)
from three major urban areas of Andalusia (Spain). Its forecasting performance was contrasted with an algorithm
especially designed to forecast time series exhibiting temporal patterns. The proposed methodology outperforms
the contrast algorithm and yields results comparable to others in the literature. Its use is encouraged because of
its forecasting performance and wide applicability, and also as a benchmark methodology.
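A forecasting setup built only on lagged O3 values can be prepared as in the sketch below, which pairs each target hour with two earlier observations and scores a simple persistence baseline. The abstract does not state which lags were used, so the 24 h and 48 h lags, the synthetic series and the names `make_lagged` and `mean_absolute_error` are assumptions.

```python
def make_lagged(series, lags=(24, 48)):
    """Pair each target hour with the values observed `lags` hours earlier.

    For a 24 h-ahead forecast every lag must be at least 24, so that the
    predictors are already known when the forecast is issued.
    """
    start = max(lags)
    X = [[series[t - lag] for lag in lags] for t in range(start, len(series))]
    y = [series[t] for t in range(start, len(series))]
    return X, y


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic hourly series, used only to show the shapes involved.
hourly_o3 = [float(h) for h in range(100)]
X, y = make_lagged(hourly_o3)

# Persistence baseline: tomorrow at this hour equals today at this hour.
persistence = [row[0] for row in X]
baseline_mae = mean_absolute_error(y, persistence)
```

The rows of `X` and `y` could then be passed to any regression learner, for instance inside a bagging ensemble, and compared against the persistence error.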
A data science approach for spatiotemporal modelling of low and resident air pollution in Madrid (Spain): Implications for epidemiological studies
Developing models to assess different air pollution exposures within cities is still a key challenge in environmental epidemiology. Background air pollution is a long-term, resident, low-level pollution that is difficult to quantify and to which the population is chronically exposed. In this study, hourly time series of four key air pollutants were analysed using Hidden Markov Models to estimate the exposure to background pollution in Madrid from 2001 to 2017. Using these estimates, the spatial distribution of background pollution was then analysed by combining the interpolation results of ordinary kriging and inverse distance weighting. The ratio of ambient to background pollution differs according to the pollutant studied but is estimated to be, on average, about six to one. This methodology is proposed not only to describe the temporal and spatial variability of this complex exposure, but also to be used as input in new modelling approaches of air pollution in urban areas. (c) 2018 The Author
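Of the two interpolation methods mentioned, inverse distance weighting is simple enough to sketch in a few lines. The example below is a generic version, not the study's configuration: the power parameter, the planar coordinates, the station values and the function name `idw` are assumptions, and the way the result was combined with ordinary kriging is not described in the abstract.

```python
import math


def idw(stations, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    stations  list of ((x, y), value) pairs
    target    (x, y) location to estimate
    """
    numerator = denominator = 0.0
    for (x, y), value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # exactly on a station: return its measurement
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical background estimates (ug/m3) at two stations, coordinates in km.
stations = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
midpoint = idw(stations, (1.0, 0.0))
at_station = idw(stations, (0.0, 0.0))
```

A larger `power` makes the estimate follow the nearest station more closely, while a smaller one smooths the surface.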